De Novo Genome Assembly    ◾    105

If the installation is successful, you can run the following to display the help:

busco --help

BUSCO provides ortholog datasets for many clades of organisms. Before using BUSCO, you need to identify the dataset that matches the organism being assessed. The list of available datasets can be displayed using the following command:

busco --list-datasets

Now, we can use BUSCO to assess the three E. coli assemblies (one generated with ABySS

and two generated by SPAdes). We can save the output of each assessment in a separate

directory.

busco \
  -i abyss_ecoli_ass.fasta \
  -o abyss_ecoli_ass.out \
  -l bacteria \
  -m genome

busco \
  -i spades_ecoli_ass.fasta \
  -o spades_ecoli_ass.out \
  -l bacteria \
  -m genome

busco \
  -i spades_hyb_ecoli_ass.fasta \
  -o spades_hyb_ecoli_ass.out \
  -l bacteria \
  -m genome
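Since the three commands differ only in the assembly file name, they can also be generated in a loop. The following is a minimal sketch, not part of the original workflow: busco_cmd is a hypothetical helper that prints the command for one assembly, using only the options shown above, and it assumes that every input file ends in ".fasta".

```shell
# Hypothetical helper: print the BUSCO command for one assembly.
# $1 is the FASTA file; $2 is the lineage (defaults to "bacteria").
# The output name is the input name with ".fasta" replaced by ".out".
busco_cmd() {
    printf 'busco -i %s -o %s -l %s -m genome\n' \
        "$1" "${1%.fasta}.out" "${2:-bacteria}"
}

# Print one command per assembly.
for asm in abyss_ecoli_ass.fasta spades_ecoli_ass.fasta spades_hyb_ecoli_ass.fasta; do
    busco_cmd "$asm"
done
```

The sketch only prints the commands so that they can be inspected first; piping the output into sh would execute them one after another.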

The BUSCO assessment output for each assembly will be saved in a separate directory: “abyss_ecoli_ass.out”, “spades_ecoli_ass.out”, and “spades_hyb_ecoli_ass.out”. Each of these directories includes the assessment report as both a text file and a JSON file, in addition to subdirectories for the predicted genes and the ortholog database that was used.
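To compare the assessments side by side, the one-line score can be pulled out of each text report. The sketch below is illustrative only: print_busco_scores is a hypothetical helper, and it assumes that the text report sits at the top level of each output directory with a name matching short_summary*.txt, which may differ between BUSCO versions.

```shell
# Hypothetical helper: print "<directory><TAB><score line>" for each
# BUSCO output directory given as an argument.
# ASSUMPTION: the text report is named short_summary*.txt.
print_busco_scores() {
    for dir in "$@"; do
        for f in "$dir"/short_summary*.txt; do
            # Skip directories in which no report was found.
            [ -e "$f" ] || continue
            # Keep the first line holding a score such as "C:98.4%[...".
            score=$(grep 'C:[0-9.]*%' "$f" | head -n 1 | tr -d ' \t')
            printf '%s\t%s\n' "$dir" "$score"
        done
    done
}

print_busco_scores abyss_ecoli_ass.out spades_ecoli_ass.out spades_hyb_ecoli_ass.out
```

Directories without a matching report are silently skipped, so the call is safe to run before all three assessments have finished.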

Comparing the three assemblies on the BUSCO assessment metrics (Figures 3.13–3.15), the two assemblies generated by SPAdes are better than the one generated by ABySS. The bacteria dataset used for the assessment was built from 4085 genomes and contains 124 genes that are expected to be present. The E. coli assembly generated by SPAdes shows 100% completeness (C:100%), with no duplicated (D:0.0%), fragmented (F:0.0%), or missing (M:0.0%) genes out of the 124, whereas the BUSCO assessment report for the assembly generated by ABySS shows C:98.4% [S:98.4%, D:0.0%], F:1.6%, M:0.0%, n:124, which indicates 98.4% completeness (122 genes recovered) and 1.6% fragmentation (2 genes partially recovered).
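The gene counts quoted above follow directly from the percentages and the dataset size n. As a rough sketch of that arithmetic, the hypothetical helper busco_counts below parses a BUSCO score line and converts each percentage into a gene count rounded to the nearest whole number; because the percentages in the report are themselves rounded, the counts are approximate.

```shell
# Hypothetical helper: convert a BUSCO score line into gene counts.
busco_counts() {
    printf '%s\n' "$1" | tr -d ' ' | sed 's/[][]/,/g' | awk -F',' '
    {
        # Each non-empty field looks like "C:98.4%" or "n:124".
        for (i = 1; i <= NF; i++) {
            if (split($i, kv, ":") == 2) {
                sub(/%$/, "", kv[2])
                val[kv[1]] = kv[2]
            }
        }
        total = val["n"]
        # count = percentage * n / 100, rounded to the nearest integer
        printf "complete=%d fragmented=%d missing=%d total=%d\n",
            int(val["C"] * total / 100 + 0.5),
            int(val["F"] * total / 100 + 0.5),
            int(val["M"] * total / 100 + 0.5),
            total
    }'
}

busco_counts "C:98.4% [S:98.4%, D:0.0%], F:1.6%, M:0.0%, n:124"
# prints: complete=122 fragmented=2 missing=0 total=124
```

Here 98.4% of 124 is 122.016 and 1.6% of 124 is 1.984, which round to the 122 complete and 2 fragmented genes reported in the text.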

Combining the statistical and evolutionary assessments gives a well-rounded picture of the quality of a de novo assembled genome.